The IASSIST QUARTERLY represents an international cooperative
effort on the part of individuals managing, operating, or using machine-
readable data archives, data libraries, and data services. The
QUARTERLY reports on activities related to the production, acquisition,
preservation, processing, distribution, and use of machine-readable data
carried out by its members and others in the international social science
community. Your contributions and suggestions for topics of interest are
welcomed. The views set forth by authors of articles contained in this
publication are not necessarily those of IASSIST.

Information  for  Authors 

The QUARTERLY is published four times per year. Articles and other
information should be typewritten and double-spaced. Each page of the
manuscript should be numbered. The first page should contain the article
title, author's name, affiliation, address to which correspondence may be
sent, and telephone number. Footnotes and bibliographic citations should
be consistent in style, preferably following a standard authority such as
the University of Chicago Press Manual of Style or Kate L. Turabian's
Manual for Writers. Where appropriate, machine-readable data files
should be cited with bibliographic citations consistent in style with Dodd,
Sue A. "Bibliographic references for numeric social science data files:
suggested guidelines". Journal of the American Society for Information
Science 30(2):77-82, March 1979. If the contribution is an announcement
of a conference, training session, or the like, the text should include a
mailing address and a telephone number for the director of the event or for
the organization sponsoring the event. Book notices and reviews should
not exceed two double-spaced pages. Deadlines for submitting articles
are six weeks before publication. Manuscripts should be sent in duplicate
to the Editor: Laura Bartolo, Libraries & Media Services, Kent State
University, Kent, Ohio 44242. (216) 672-3024. Email:
LBARTOLO@KENTVM.KENT.EDU. Book reviews should be
submitted in duplicate to the Book Review Editor: Daniel Tsang, Main
Library, University of California, P.O. Box 19557, Irvine, California
92713 USA. (714) 856-4978. E-Mail: DTSANG@ORIONCT.UCI.ED
Facilitating Access to Comparative Data

by Ekkehard Mochmann & Lorenz Graf

Central Archive for Empirical Social Research (ZA)

at the University of Cologne


Mandate  of  the  ZA  as  part  of  a  Social  Science  Infrastructure 

The Central Archive for Empirical Social Research (ZA) at the University of Cologne serves as a research, training and
resource center for social research. Founded in 1960 by the Faculty of Economics and Social Sciences of the University of
Cologne, it soon developed into a data service with a supraregional and international clientele. As a central node in the
international data service network it became the starting point for a more comprehensive social science infrastructure, the
German Social Science Infrastructure Services (GESIS e.V.). This association was created in 1986 as a response to needs
formulated by the social science profession, to provide infrastructural services in all fields of social research with particular
emphasis on:

• collecting data and making it available for further research.

• informing about social science literature and research projects.

• development of research methods, teaching instruments and methods consulting for research projects.

The core of the ZA mandate is to facilitate access to already existing data, especially survey data, which can be used for
secondary analysis. The holdings cover all fields of empirical social research. Beyond survey data there are collections of
statistical data, regional data, various types of quantitative historical data and machine readable texts for computer assisted
content analysis, as well as party manifestos and other text collections.

ZA provides services in the area of acquisition, processing, documenting and making available data for social research,
especially survey data. ZA offers consulting services for secondary analysis. Training in complex analysis methods takes
place twice a year in the ZA spring seminar for empirical social research and the autumn seminar for quantitative historical
research. Beyond this, ZA creates ex post statistical time series and supports comparative international studies for the analysis
of long term social developments.

ZA holdings of empirical social research data include European time series and comparative studies. The ZA department
ZHSF (Center for Historical Social Research) develops data bases, in some cases going back to earlier centuries. The ZA
holds nearly 4000 data sets and data collections. Even though there is no particular topical restriction, emphasis is on topics
such as political attitudes, election studies, education, unemployment, leisure and occupation, media and the environment.

Among the data sets intensively used are the EUROBAROMETERs (a data pool of comparative surveys from European
countries taken for more than 15 years), the German General Social Survey ALLBUS, which is conducted every two years,
and the International Social Survey Program (ISSP) for 25 countries from Australia, America, Europe to Japan. Similar attention
is paid to the monthly POLITBAROMETER series provided by the Research Group Elections (FGW: Forschungsgruppe
Wahlen), which is also presented on the second public TV station (ZDF) every month, and the collection of election surveys
for the national parliament (Bundestag) since 1949.

A GESIS branch in Berlin is now focusing on data and information transfer from and to Eastern Europe. Recently more than
400 data sets from surveys conducted in the former German Democratic Republic (GDR) since 1975 were included in the
ZA holdings and were processed for secondary analyses. Currently emphasis is on supporting initiatives to create
infrastructure institutes in Eastern Europe and to develop a service network for European-wide data transfer.

The ZA has access to data held in the social science data archives world wide. International data transfer is coordinated with
the Council of European Social Science Data Archives (CESSDA) and the International Federation of Data Organizations for
the Social Sciences (IFDO). Access to internationally distributed data bases is supported by making use of modern telematic
services like WAIS, WWW, FTP on the INTERNET and other computer networks.

Selecting relevant data and solving methodological problems relating to secondary analysis is an essential part of individual
consulting. The newsletter ZA INFORMATION and the journal Historical Social Research (HSR) inform about new data
sets, methodological developments, research findings and conferences. A documentation of more than 1000 empirical
research projects conducted in Germany, Austria and Switzerland is published annually.

Organizational  priorities  in  collecting  and  distributing  data 

From the very beginning the ZA philosophy was to develop services in close interaction with the scientific community. As a
consequence of this philosophy ZA also supports a small research and training department which focuses on new
methodological developments in data collection and analysis. Under a guest professor scheme scholars from abroad are
supported in their research from planning new surveys to secondary analysis of available data. Experts in data management
and analysis offer advice from the selection of appropriate data to advanced statistical analysis.

Already in the 60s Erwin K. Scheuch, one of the ZA founding fathers, created a climate for comparative research which was
inspired by the Standing Committee for Comparative Research of the International Social Science Council, in which he
cooperated with Stein Rokkan and Warren Miller. This orientation was reinforced by the emerging European Unification and
the globalization of social research.

Over the years several international research projects have chosen the ZA as their resource center for creating integrated
data bases. Integrating national data sets into internationally comparative data sets includes comprehensive documentation of
the methodological, technical and historical background of a study and additional interpretation knowledge to facilitate further
comparative analysis. Currently ZA serves in this function for the EUROBAROMETERS (jointly with ICPSR and Swedish
Data Services (SSD)), the International Social Survey Program (ISSP), and the major election studies to national parliaments
in Europe (ICORE). Bringing together researchers working on the data and the data management experience of ZA provides
a unique working environment for creating an integrated fund of knowledge on core topics of European social development.
In cooperation with the principal investigators and other European data services ZA coordinates and creates European data
bases which could otherwise not be made available to the scientific community by relying just on national resources.

ZA strongly supports a policy of labor division between European archives according to topically focused European data
collections. Under tightening resources this is a must for integrating the European data bases. In spite of intellectual and
political efforts there is an ongoing demand for additional European resources to achieve what cannot be covered by the
subsidiarity principle: the data service capacities are by and large absorbed by the national demands and there is little
leverage to cope with additional international workloads.

Direct access to the expertise and information banks on social science literature and research projects, as well as to the
methodological expertise of its GESIS partner institutes, complements this infrastructural support for the production and
analysis of comparative data bases on Europe.

The  ZA  User  Survey 

Although the Central Archive has always made efforts to communicate with its clientele, we have found it necessary to get
more information about our clientele to face the rapid technological changes which are taking place. In the past, information
concerning the needs and demands of the clientele was mostly gathered by mail surveys. This procedure involves three major
drawbacks. First, only the users of the institution are surveyed, so we would miss the comments of those researchers who did
not make use of the data services. Second, the findings are often biased because only the most motivated people contribute to
these surveys. Third, the response rates in mail surveys are low. The installation of a laboratory for telephone surveys at the
University of Cologne last autumn gave us the opportunity to avoid these drawbacks. In a pilot study to test this telephone
facility we could interview social researchers about their research environment and about their impression of the Central
Archive.

Description  of  the  sampling  procedure 

The target population of this study was all social scientists engaged in empirical research. For this purpose we defined
empirical social research as a quantitative approach which is done with the methods of empirical social research, mainly
interviewing, observation and content or document analysis (cf. Obershall 1972). Since there is no list of scientists using this
very approach in their research, we could have started our project with a list of institutions known to us as informants for our
documentation. But this procedure would have led to some sort of snowball sample resulting in an unpredictable sample
structure. Furthermore, we wanted to interview even those people who do social research but do not want to appear in our
documentation and so do not inform us about their work. Eventually, we came to the conclusion that a sample drawn from
the subscribers of the ZA Newsletter would suit our needs best, for they may be assumed to be highly interested in the
application of the methods of social research. It was equally important for our purpose that fifty percent of the ZA users were
also subscribers to the Newsletter. Sampling among the subscribers of the ZA Newsletter gave us the opportunity to get the
feedback of those who had already made use of the services of the Central Archive, and of potential users as well.

We planned to collect about 500 interviews. In advance, we estimated that about 25% of the subscribers were not directly
involved in empirical research, e.g. librarians or local staff of university computer centers. From over 2,500 subscribers of the
ZA Newsletter we drew a sample of 1375 people. Since we only had their addresses we had to find out their phone numbers.
Using a telephone directory CD-ROM and directory inquiries we managed to locate more than 1,100 potential respondents.
The survey took place between Nov. 28 and Dec. 5, 1995. We conducted over 700 interviews. More than 200 respondents
were not engaged in empirical social research so we finished with a sample of 538 social researchers. Figure 1 shows the
details of the sampling procedure.
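
The sampling funnel described above can be retraced with a few lines of arithmetic. The following Python sketch is illustrative only and not part of the original study: the stage labels and the names `funnel` and `stage_rates` are our own, and the counts reported in the text as "over" or "more than" (2,500 subscribers, 1,100 located, 700 interviews) are entered as lower bounds, so the resulting rates are approximations.

```python
# Sampling funnel of the ZA User Survey, using the counts given in the text.
# Counts described as "over"/"more than" are entered as lower bounds, so the
# resulting rates are approximate.
funnel = [
    ("Newsletter subscribers (lower bound)", 2500),
    ("Sample drawn", 1375),
    ("Phone number located (lower bound)", 1100),
    ("Interviews conducted (lower bound)", 700),
    ("Engaged in empirical social research", 538),
]


def stage_rates(stages):
    """Return (label, count, share of previous stage in percent) per stage."""
    rows = []
    previous = None
    for label, count in stages:
        share = None if previous is None else round(100.0 * count / previous, 1)
        rows.append((label, count, share))
        previous = count
    return rows


rates = stage_rates(funnel)
for label, count, share in rates:
    suffix = "" if share is None else f"  ({share}% of previous stage)"
    print(f"{label:<40}{count:>6}{suffix}")
```

Because the interview count is only a lower bound, the last share somewhat overstates the proportion of interviewed subscribers who were engaged in empirical research: the text implies at least 738 interviews (538 retained plus more than 200 excluded).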

The interviews consisted of three parts. The first part dealt with the institutional affiliation of the researcher. The description
of the actual empirical work formed the content of the second part and finally the respondents were asked questions
concerning the performance of the Central Archive. In this paper we will focus on the description of the research community
and the ZA clientele.

Characteristics  of  the  ZA  Clientele 

Empirical social research is done in a variety of disciplines. The readers of the ZA Newsletter are heavily inclined to
sociology as shown in figure 2. Two in five researchers (38.0%) belong to an institute which is situated in the field of
sociology. The relevance of sociology is outstanding. It is mentioned nearly three times as often as psychology (13.3%),
which ranks second. Next follows a group of three fields with a proportion of ten percent each: Economics, Political Science
and Education. These five subjects together form the core of the Social Sciences. Medicine, Communication Studies and
Market Research gain nearly 5% each. Statistics ranks last in this list of disciplines. There were 27 other subject fields
named by the respondents, like Geography, History, Criminology, Social Psychology etc. But none gained more than two
percent.

Let us now focus on those interviewees who have formerly received data sets from the Central Archive. Among those people
nearly 50% belong to an institute of the research field of Sociology. The second most important discipline is Political
Science. Psychology, which was second among the readership of the ZA Newsletter, now follows in the fourth position. This
is due to the fact that psychologists do not deal that much with survey data. They prefer experimental data mostly collected
from college graduates. They adopt the methods of empirical research but they normally do not need nationwide survey data.
Political scientists on the other hand very often look for election data or data concerning the nationwide electorate. So it is not
surprising that they rank second as users of the ZA data service. Ranking third among the users of the ZA data service are
researchers belonging to institutions in the field of Economics. They gain a proportion of nearly ten percent. Psychology,
Education, Communication Science and Medicine gain 5% each. Obtaining data from the Central Archive is of less
importance for people belonging to Market Research Institutes and to the Statistics Branch. The former do not care much
about surveys carried out by other scientists and the statisticians do not seem to be in particular demand of survey data.

As shown in figure 3 two thirds of the users of the Central Archive work in an academic institutional background. 20.7% of
the respondents are employed in publicly financed research institutes. They consist mainly of federal research agencies, like
the Bundesinstitut für Bildungsforschung, or governmentally financed large scale institutes, like the Max-Planck-Institute or
the Wissenschaftszentrum Berlin für Sozialforschung. Only a small number of them are private nonprofit organizations like
the Konrad-Adenauer-Stiftung. 13.9% of the readers of the ZA Newsletter work in private organisations in the commercial
sector, mainly within the field of market research. There are only slight differences in the percentage between the researchers
who have made use of the ZA data service and those who have not. While emphasis is on providing services for the academic
community, the clientele also includes researchers from public administration and the media. In the ZA User Survey people
who work in the media are underrepresented to some extent. They normally do not subscribe to the ZA Newsletter because
they are less interested in methodological issues. The survey focused on people who are actually doing social research. But a
remarkable part of the ZA clientele is mainly interested in getting information about the distribution of attitudes in the
population.

For  what  purpose  do  the  clients  use  the  data? 

We asked those persons who had at least once received data from the Cologne Archive what purpose they followed in
examining the data. The question was posed as an open ended question and respondents could give multiple responses.
Nevertheless we got a clear-cut picture. There are two main intentions behind the ordering of data. One third of the
population uses the data as a source for secondary analysis under a new research question. Another third employs the data as
a supplement to their own data sets. This completion was mostly sought in the time dimension. In most cases this means that
researchers who have already got data at the present stage want to make comparisons concerning the same population at some
former point in time. The intention to conduct an international or intercultural comparison as a supplement to their own data
was mentioned by a smaller fraction. 10.7% of the researchers used the data for teaching purposes in class and 6.4% used the
data in order to evaluate indicators used by other researchers. Another 10% named other intentions, like information about
the distribution of certain opinions in society, compilation of a dissertation thesis etc.

Internet  as  a  Research  Tool 

In the near future the already heavy use of the internet by the research community will drastically change and hopefully
improve the conditions for doing scientific research. Computer mediated communication via email will expand the possibility
to collaborate and interact with distant colleagues. The flow of information will accelerate and increase when scientists begin
to use virtual arenas (multi user dungeons, news groups, mailing lists, virtual conferences, electronic journals etc.) to discuss
and distribute new ideas. With more scientists using the internet the demand for quick and easy access to socio-economic
data will increase. Therefore it is vital for data archives to know how many of their clients have access to the internet and for
how many of them it has already become an ordinary research tool.

In November 1995, when we asked German social scientists, 56.7% of them had direct access to the internet from their
working place and 35.2% made frequent use of it. This low percentage indicates that the internet revolution has still to gain
ground in Germany. Only one third of the subscribers of the ZA Newsletter have adopted the internet as a research tool.
Broken down by institutional affiliation we find a big difference between researchers working in the academic context and
those who work outside university. Already 68.4% of the respondents belonging to university institutes have access to the
internet. This figure is nearly twice as high as in non-academic institutes. In private organisations only 32.2% of the
employees have a direct connection to the internet. In governmentally financed and nonprofit institutes the adoption rate
is higher and amounts to 42.7%.


                                Access    Frequent use    Ratio (use/access)      N

University                       68.4         44.1              64.5            (320)
Government / Nonprofit           42.7         26.1              61.1            (119)
Industry                         32.2         15.1              46.9             (73)
No institutional affiliation     33.3         26.7              80.2             (15)

Total                            56.7         35.2              62.1            (527)

Table 1: Access and use of the internet by institutional affiliation

Access to the internet does not imply that scientists make use of the internet in their daily work. Only two thirds of the
researchers with access to the internet adopt an internet based service as an ordinary research tool. There is still a lot of
hesitation in exploring the usefulness of the internet. The ratio of use to access of the internet is nearly the same in university
institutes and in governmentally financed institutes. But in the industrial context only one half of the people with direct
access to the internet make frequent use of email, WWW, FTP or some other internet service. The situation seems even worse
if we look at the percentage of scientists in the industrial context using the internet. Only 15.1% of them mention the internet
as a useful research tool. In Germany, at the time of our survey, the internet was still an academic challenge. But we suppose
that the low adoption rates in the industry sector are only due to the fact that the internet is basically an academic invention.
With a time lag of a few months we expect that researchers in non-academic institutes will use the internet with the same
frequency as their colleagues in university institutes.
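
The ratio of use to access reported in Table 1 is simply the share of frequent users among those who have access. As a minimal sketch, assuming the access and frequent-use percentages from Table 1 and using our own shorthand labels and function name (`use_access_ratio` is not from the original study), the ratio column can be recomputed as follows:

```python
# Percentages from Table 1: (access, frequent use), both in percent of all
# respondents in the group. Group keys are shorthand labels for the table rows.
table1 = {
    "University": (68.4, 44.1),
    "Government / Nonprofit": (42.7, 26.1),
    "Industry": (32.2, 15.1),
    "No institutional affiliation": (33.3, 26.7),
    "All respondents": (56.7, 35.2),
}


def use_access_ratio(access_pct, use_pct):
    """Frequent users as a percentage of those who have access."""
    return round(100.0 * use_pct / access_pct, 1)


ratios = {group: use_access_ratio(a, u) for group, (a, u) in table1.items()}
for group, ratio in ratios.items():
    print(f"{group:<30}{ratio:>6}")
```

The recomputed values agree with the published ratio column, which confirms that the ratio is conditional on access rather than a share of all respondents.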

It is often assumed that one of the major impacts of the use of internet services will be a gap between young and skilled
persons who adopt the new technologies quickly and older people who will be excluded from the new information
technologies (cf. Negroponte 1995). In our study we do not find support for the thesis of a widening gap between the
generations. We do find differences between young and older scientists in the access rates but there are no differences in
adoption rates. Since the internet has not yet arrived in non-academic institutes we confine the analysis of this thesis to
institutes in universities. As shown in table 2 nearly three quarters (73.8%) of the young scientists (age < 40) have access to
the internet. Among the older scientists (age ≥ 40) 65.6% have direct access. If we focus on those people in
universities who have direct access to the internet, the percentage of frequent users among young scientists is nearly
the same as among older scientists. 65.9% of the young scientists make frequent use of internet services compared to 64.0%
of older scientists. Thus the adoption rates in the two age groups are almost identical. If there was an effect of age on
adoption we would expect a much higher adoption rate among the younger scientists. We can conclude from these findings
that differences in the percentage of internet users between age groups are only the result of differences in access rates.
Presumably older people get access to the internet at a later stage of the innovation process than younger people. But if there
is direct access to the internet the same fraction of researchers will use the internet in the older and in the younger
generation. This indicates that the use of internet services is already a valuable research tool and that it depends mostly on the
institutional context in which the scientist works whether he adopts internet services or does not. But we can expect that in
the near future the use of internet services will be as natural as that of personal computers is now. Therefore archives have
started to prepare themselves for the coming internet age.

                                Access    Frequent use    Ratio (use/access)      N

under 40 years                   73.8         48.6              65.9            (107)
40 years and older               65.6         42.0              64.0            (212)

only university institutes

Table 2: Access and use of the internet by age
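
The argument that the age gap in internet use stems from access rather than adoption can be illustrated with a simple counterfactual. The sketch below is our own illustration and not a calculation from the original study: it takes the Table 2 percentages, gives the older group the access rate of the younger group while keeping its own adoption rate, and splits the observed gap in frequent use accordingly. All variable and function names are hypothetical.

```python
# University respondents only (Table 2): access and frequent use in percent.
young = {"access": 73.8, "use": 48.6}   # under 40 years
older = {"access": 65.6, "use": 42.0}   # 40 years and older


def adoption(group):
    """Share of those with access who are frequent users (between 0 and 1)."""
    return group["use"] / group["access"]


# Observed gap in frequent use between the age groups, in percentage points.
observed_gap = young["use"] - older["use"]

# Counterfactual: older scientists keep their own adoption rate but are given
# the access rate of the younger group.
older_with_young_access = young["access"] * adoption(older)

# Part of the gap that disappears once access is equalized, and the remainder
# that is attributable to the (small) difference in adoption rates.
gap_due_to_access = older_with_young_access - older["use"]
gap_due_to_adoption = young["use"] - older_with_young_access

print(f"observed gap in frequent use: {observed_gap:.2f} points")
print(f"  attributable to access:     {gap_due_to_access:.2f} points")
print(f"  attributable to adoption:   {gap_due_to_adoption:.2f} points")
```

Under this simple decomposition most of the difference in frequent use between the two age groups is tied to the difference in access, which is in line with the conclusion drawn in the text.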

Desiderata  and  Recommendations  of  the  ZA  Clientele 

At the end of the interview we asked the respondents if there was anything that the Central Archive should improve or which
services should be introduced. A large fraction of the respondents commented on the information policy. They wanted
information to come more frequently and more directly to their working place. Another group recommended giving more
detailed information. Some researchers advised us to strengthen the effort of addressing people outside the core disciplines
of the Social Sciences. The second main topic was the dissemination of data. Many of the users wanted to have quick and easy
access to the data via FTP and to have more data sets made available on CD-ROM. Some mentioned the present pricing
policy and expressed their wish for reduced charges for data access. As the third main topic some users pointed to topically
focused data collections and to a better and easier access to international data. Some researchers would be glad if we could
offer more surveys from the field of commercial market research and if we could offer more recent data.

Facilitating  Access  to  Comparative  Data 
Using  the  Internet  and  Publishing  on  CD-ROM 

The Central Archive has always made the effort to expand its services and to use new technology to disseminate data and to
communicate with its clientele, as shown in the first chapter of this paper. In response to the answers the researchers gave in
the user survey the Central Archive will strengthen these efforts. We will spread information about new data sets and other
relevant news through a mailing list. More detailed and always up-to-date information can be found on our web pages
(http://www.za.uni-koeln.de/) just now and will be developed further in the near future. The question text, codebook
information and marginal distributions of the International Social Survey Program (ISSP), for example, are searchable on the
Internet under WAIS. Soon data will be accessible by FTP transfer. Furthermore we will enlarge our collections of data sets
available on CD-ROM. Third, we are engaged in a multinational project named ILSES, which aims at the development of an
integrated library and data service. Finally, we have installed a scientific laboratory equipped with all the infrastructure needed
for comparative research. Also, the European Data Archives are creating a virtually integrated catalogue of their holdings,
accessible via Internet.

Social  Research  Labs  /  Large  Scale  Facilities 

As we start aging in the virtual scientific community we learn that the dream of information and data traveling to any place in
the world is becoming true, yet it does not provide the ideal research environment for comparative research. Researchers
may be well informed about major events in their societies that might have had an impact on attitudes and behavior of
respondents. The further we progress in time, the more interpretation knowledge must be transferred to the collective memory
of researchers in order to provide the context that was decisive in the phase of data collection. This is particularly relevant for
information about other societies which are not part of the daily information routine of the researcher.

Contextual information, cultural background and historic knowledge which may be necessary for sound interpretation of
empirical evidence do not automatically travel with the collection of data sets from different societies. Bringing together
relevant data is still an exercise in systematic selection of comparable variables, data recoding and overcoming transborder
data flow hurdles emanating from data protection and data access regulations.

A response to the needs of comparative research may be social science data labs, in which all relevant data and information
for a particular research field are at the researcher's fingertips. Over the past two years ZA has created a EUROLAB which provides access
to major comparative studies and related background material (e.g. party manifestos, media reports, event data bases, fact
books etc.). The study collections include, among others, the International Social Survey Program, the Eurobarometers and
major election studies on national parliaments in Europe.

The Standing Committee for the Social Sciences of the European Science Foundation had pointed to the need for better
integration of the European data base and brought to the attention of the European Union that social science data bases are the
equivalent of the large scale research instruments of the natural and technical sciences. A study panel proposed to acknowledge
the need of social research for Large Scale Facilities where researchers not normally having access could come to profit from
available resources. The Institute for Social Sciences in Essex and the Zentralarchiv in Cologne received recognition as the first
Large Scale Facilities in Europe under the Training and Mobility Program of the EU. This will make it possible to cover travel and
subsistence costs for scholars from EU member and associated states who want to make use of these resources, subject to
approval of their applications.

Over the next three years this will allow the ZA to have scholars and research teams not only making use of the resources, but
at the same time enjoying truly comparative research by bringing together their specific knowledge about different
countries. Thus they can help to validate data and background material. Ultimately this will improve the research resources
for the scientific community at large, since validated data and background information may be compiled in knowledge bases
for general distribution.

Comments on the Data Access and Dissemination System

The Data Access and Dissemination System (DADS) will be the vehicle for dissemination of census data in the year 2000.
Information that was distributed in published census volumes in 1990 will be accessed from the internet for future censuses.
Complete data files will no longer be written to media for redistribution to users. Instead, users will access DADS and pull
off the tables they need. The advantages to the U.S. Census Bureau and its customers are quicker turnaround for release of
files, cost-effectiveness, and increased access.

Several factors have probably motivated the Census Bureau to make this move. First, all federal agencies are responding to
Al Gore's call for internet access by January 1, 1996. Second, changes in technology, such as the development of the
internet, high speed computers, and low-cost storage, have made this method of distribution feasible. In the past year, we
have witnessed an explosion in the number of websites that distribute data (e.g. PSID, HRS/AHEAD, National Longitudinal
Surveys, IPUMS, Milwaukee Parental Choice Program, Wisconsin Longitudinal Study, Russian Longitudinal Monitoring
Survey, World Fertility Surveys, Malaysia Family Life Survey, Survey of Families and Households). Finally, distributing
information via DADS is a cheaper alternative for the Census Bureau, particularly when compared with the cost of printing.

As  with  any  change,  however,  there  are  probably  people  who  were  better  served  by  the  old  dissemination  methods  than  they 
will  be  by  DADS.  It  is  clear  that  the  Census  Bureau  wants  this  system  to  serve  all  users.  However,  there  are  some 
shortcomings  that  should  be  solved  in  upcoming  renditions  of  DADS  if  the  Census  Bureau  is  to  reach  that  goal. 

The Data Access and Dissemination System doesn't exist yet. It is still a concept. However, I will use the "Data Access"
page on the Census Bureau's Web site as the basis for my comments: it provides a good working model of DADS, and much of it is likely to be incorporated
into the future operational DADS. It is also likely that many features of this current system will be remodeled, so some of
my comments may be "old news" to the inner circle of DADS developers.

The current configuration of DADS needs three important improvements. First, DADS should provide the same information
that one could get using the old dissemination methods. The data may be in a different form than they were in the past, but
the content of the data must not be compromised. When the PSID changed from a family/individual file with a record length
approaching 32,767 to its new form of family records and individual records, the same information was still available. It
takes new knowledge to work with the data, but users can still create the same sorts of tables they could create in the past. In
contrast, the current configuration of DADS does not allow users to create all the tables they could in the past. Second,
DADS should accommodate the users who have access to high-speed computers and large amounts of disk space. Some
users would like to have the data on their own systems, rather than requesting tables and extracts from DADS. The FTP
access to raw files is weak in the Data Access site. If FTP access to raw files proves to be impossible because of the need to
protect respondent confidentiality, then the extraction system should be improved. Finally, and perhaps foremost in the
minds of data librarians and archivists, there is the question of whether DADS will meet the archival needs of future users of
census data. Expertise on and access to state and federal records tends to be fairly short-lived. Thus, it is essential that the
archival needs of future researchers be considered in the development of DADS. What sorts of records will be turned over
to the National Archives and in what form?

Loss of Information with DADS

Most users of summary tape files (STF) find the summary-level sequence charts confusing. However, the current
configuration of the Data Access system does not make it clear that all the choices available in a typical summary tape file
are available in the new system. Users can get tabulations for states, counties, metropolitan statistical areas (MSAs), tracts,
and blocks, the most typical choices. But can they get tabulations for central cities of MSAs (summary level 340) or for any
of the American Indian Reservation categorizations (summary levels 210-221)? What about county-specific zip code
statistics (summary level 820 versus summary level 810)?

The way the Data Access system is currently configured, some items in the geographic identification section are not accessible.
Occasionally users need the longitude and latitude or land area of census tracts or blocks for the computation of a summary
measure such as a residential segregation index. However, users can't select these items, or other items such as consolidated
city population size code, place class code, or place description code, from this section.

DADS works best when users are making a request for a small number of tables for a small number of geographic units. For
example, this system works well for a user who wants to know the population size (a single cell) for all counties in Michigan
or the distribution of income in Houston, Texas. However, many analysts need perhaps 10 or 12 tables (which might mean
200 or even 1,000 cells) for all zip codes in the nation. To get these data from DADS in its current configuration, one must
list all the relevant zip codes. Typing in over 10,000 zip codes is not a very practical alternative! In a typical STF request
based on data stored on-line, one would select the appropriate summary level for zip code (820) and would get all the tables
for all zip codes with the execution of one job. The configuration for block groups and tracts is similar, but one only has to
highlight the tracts or block groups rather than typing them out. DADS can handle these requests for summary statistics for
all zip codes or census tracts in the U.S., but it is a very labor-intensive task for the requester. The analyst who makes this
sort of request is not just mining data that are never analyzed. The analyst is reducing 31,000 columns of information into
200-1,000 columns and then making the request for a unit of analysis that might range from around 3,000 for counties to
more than 100,000 for block groups.
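
The difference between listing every zip code and selecting a summary level can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the record layout (dictionaries with hypothetical `sumlev`, `geo_id`, and `tables` fields), the table names, the counts, and the function `select_by_summary_level` are all invented for the example and do not describe the real STF layout or any DADS interface. Only the summary level code 820 for zip codes comes from the text.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def select_by_summary_level(
    records: Iterable[Record], sumlev: str, tables: List[str]
) -> List[Record]:
    """Return the requested tables for every geographic unit at one summary level."""
    selected = []
    for rec in records:
        if rec["sumlev"] != sumlev:
            continue  # skip states, counties, tracts, and so on
        wanted = {name: rec["tables"][name] for name in tables if name in rec["tables"]}
        selected.append({"geo_id": rec["geo_id"], "tables": wanted})
    return selected


# Made-up records standing in for a summary tape file (hypothetical layout and counts).
SAMPLE = [
    {"sumlev": "040", "geo_id": "state-A", "tables": {"P1": 1000, "P80": 55}},
    {"sumlev": "820", "geo_id": "zip-00001", "tables": {"P1": 120, "P80": 7}},
    {"sumlev": "820", "geo_id": "zip-00002", "tables": {"P1": 340, "P80": 12}},
    {"sumlev": "050", "geo_id": "county-A", "tables": {"P1": 460, "P80": 19}},
]

# One request returns every zip-code record; no zip code is typed in by hand.
zip_rows = select_by_summary_level(SAMPLE, "820", ["P1"])
```

The point of the sketch is that the request is expressed once, as a summary level plus a list of tables, so the effort for the requester does not grow with the number of geographic units.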

I'm certain that the future DADS will allow access to all summary tape file information. However, so far, only summary tape
files 1 and 3 have been released in CD-ROM form. Thus, none of the race-specific tabulations from STF2 and STF4 are available
under the current Data Access system.

One would hope that DADS would allow the Census Bureau to eliminate the distinction between summary tape files and
public use microdata files. It would be very useful for researchers to be able to define their own tables rather than being
restricted to the limited number of tables supplied by the Census Bureau in its summary tape files. Some researchers
at Michigan recently wanted to look at disability according to race and sex. However, our researchers needed an age
breakdown other than the typical 18-64 and 65+ groupings. Because the table they needed was not available in an STF file,
they created one using PUMS files. Using PUMS, however, gave them a smaller case base, and the census geography could
not be perfectly duplicated. In general, if analysts make a table or summary statistic based on a small number of variables,
they should be able to get the tabulation for any level of geography. However, if they want to use 15 variables to define a
summary statistic, then the level of geography becomes much more restricted (state, MSA, or PUMA). Thus, another
advantage of eliminating the distinction between summary tape files and public use microdata files would be the increased
sample size for the public use microdata files. Small populations such as male clerical workers, female pilots, 50 to 54 year-old
women with own children under 5 years of age, or persons born in Guam could be studied better with a 16% count rather
than the 5% files available with public use microdata. (On the other hand, I shudder to think that some of our researchers
would be tempted to try to swallow the 16% count of white prime-age males when even the 5% count proves to be fairly
cumbersome.)
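
A user-defined table of the kind the Michigan researchers needed can be sketched as follows. This is a minimal illustration, not the program they used: the field names (`race`, `sex`, `age`, `disabled`, `weight`), the category codes, the weights, and the age cutpoints are all assumptions chosen for the example. It shows only the general idea of tabulating weighted microdata records with an age breakdown that the published tables do not offer.

```python
import bisect
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def age_group(age: int, cutpoints: List[int]) -> str:
    """Label an age using user-defined cutpoints, e.g. [18, 45, 65]."""
    labels = (
        [f"<{cutpoints[0]}"]
        + [f"{lo}-{hi - 1}" for lo, hi in zip(cutpoints, cutpoints[1:])]
        + [f"{cutpoints[-1]}+"]
    )
    return labels[bisect.bisect_right(cutpoints, age)]


def weighted_table(
    persons: Iterable[Dict[str, Any]], cutpoints: List[int]
) -> Counter:
    """Sum person weights in each race x sex x age group x disability cell."""
    table: Counter = Counter()
    for p in persons:
        key: Tuple[str, str, str, bool] = (
            p["race"], p["sex"], age_group(p["age"], cutpoints), p["disabled"]
        )
        table[key] += p["weight"]
    return table


# Made-up person records; codes and weights are illustrative only.
PERSONS = [
    {"race": "A", "sex": "F", "age": 30, "disabled": True, "weight": 20},
    {"race": "A", "sex": "F", "age": 44, "disabled": True, "weight": 25},
    {"race": "A", "sex": "F", "age": 50, "disabled": False, "weight": 20},
    {"race": "B", "sex": "M", "age": 70, "disabled": True, "weight": 18},
]

table = weighted_table(PERSONS, [18, 45, 65])
```

Changing the cutpoint list is all it takes to get a different age breakdown, which is exactly the flexibility that fixed summary tables lack.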

FTP Access and/or Improving the Extraction Engine

Researchers who have excellent computing facilities may not want to get in the DADS queue every time they want access to
census data. If the demand for the system is great, the Census Bureau may want to allow for FTP access so that users who
have the capacity can bypass DADS except for quick exploration and for FTP access to the original raw files. If, for reasons
of confidentiality, the Census Bureau cannot provide access to the raw files (perhaps because confidentiality is built into the
DADS software rather than into the data, via sample size or census geography), then the DADS system needs to make
improvements to the existing extraction engine.

The systems developed by CIESIN (Ulysses) and Public Data Queries (Explore) are extremely fast. One of the reasons they
are so fast is that they produce tables instead of the cases and variables used to produce the tables (or summary statistics).
Any time one writes out individual records rather than tables or summary statistics, response time slows precipitously. If
many users want micro-level extracts, as opposed to exploratory tables or even output from a summary tape file request
(which is always a good example of data reduction), the response time will begin to discourage and irritate users. If a user
has the capacity to handle the raw files, the Census Bureau should allow the user to do so, and thereby free up time for users
who need the CPU.

The creators of the Integrated Public Use Microdata Samples (IPUMS) have found that their data cannot be used by all who
might be interested in them, partly because of the sheer size of the files (125G) and partly because their primary audience
(historians) traditionally has had limited access to powerful workstations. Thus, the IPUMS creators developed an extraction
system that allows a somewhat disenfranchised user to create a work file. (These users are not completely disenfranchised, as
they do have access to the internet.) However, response time will not be quick with the IPUMS data extraction system
because, at least for the short run, all extracts will be executed on a single Sparc20. Clearly, a user with access to large disk
storage and a powerful processor will be better off running the extract at his/her own desktop. Of course, the calculus needed
to figure out whether the extract should be executed by the IPUMS workstation or a local workstation is complicated by the
fact that other products, such as an extract codebook and SPSS cards, are created along with the IPUMS-based extract.

Researchers at my site, the Population Studies Center at the University of Michigan, have made countless extracts since the
release of the 1990 PUMS files using an in-house program that rectangularizes the hierarchical structure of these files.
Turnaround time is relatively quick (45 minutes to 3 hours), depending on the sample being used (1%, 3%, 5%, 8%), the number
of states requested, the size of the file being written out, and the load on the system. We purchased most of the microdata
from the Census Bureau for $4,800 (5% - $4,000 and 1% - $800). I don't have a count of the number of extracts performed
over the past few years, but conservatively it has been 500, which works out to about $10 an extract; the true number is more likely to be over
1,000. DADS will not be able to provide this quick and cost-effective system for our users, although many users will be
ecstatic about the system that DADS will provide.
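
The per-extract arithmetic can be checked with a short calculation. The purchase prices and the two extract counts are the ones quoted above; the function name `cost_per_extract` is introduced here for illustration only, and the figure ignores hardware and staff time, which the text does not quantify.

```python
def cost_per_extract(total_cost: float, n_extracts: int) -> float:
    """Average purchase cost of the microdata per extract performed."""
    if n_extracts <= 0:
        raise ValueError("n_extracts must be positive")
    return total_cost / n_extracts


# Purchase prices quoted in the text: $4,000 for the 5% sample, $800 for the 1% sample.
TOTAL_COST = 4000 + 800

# Extract counts quoted in the text: a conservative 500 and a more likely 1,000.
for n in (500, 1000):
    print(f"{n} extracts: ${cost_per_extract(TOTAL_COST, n):.2f} per extract")
```

At 500 extracts the cost is $9.60 each, which the text rounds to $10; at 1,000 extracts it falls to $4.80.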

Improving  the  Extraction  Engine 

Researchers often need access to more information than the tabular data provided through the STF data extraction system.
The need for exploratory analysis can be met with tabular data and summary statistics; and more time spent exploring data
before analysis often means less information actually ends up being extracted because the user has a much better idea of what
is needed for the actual analysis. However, researchers often need access to microdata so that they can estimate equations.
The systems developed by CIESIN (Ulysses) and Public Data Queries (Explore) allow researchers and policy analysts to get
means and crosstabs from PUMS data in a matter of seconds; but not all statistical needs can be met with simple means and
crosstabulations. The Census Bureau is aware of all of this and provides a Data Extraction engine for microdata. However,
in the case of hierarchical files similar to census microdata (CPS), the interface for extraction is quite awkward. The
interface needs to be improved, particularly if, for reasons of confidentiality, access to microdata is limited to the Census
Bureau extraction engine.

Currently, the extraction procedure requires users to extract the records separately by record type (household, family, person)
even though almost all users want a rectangular product. Although the user certainly can merge the household, family, and
person records to create a rectangular file, there does not seem to be a rationale for adding this extra step to the procedure. In
addition, merging across record types increases the possibility that a novice user will end up with an erroneous file and not
realize it. Novice users would also benefit from features such as variables that provide counts across the household (e.g. the
number of children under 4 or the number of earners) and the option to rectangularize the record based on something other
than record type (e.g., rectangularize by household relationship for a husband/wife or a mother/child file). Another common
request is to select all person records if any person in a household meets a certain criterion, such as foreign birth,
unemployment, age 60+, or interstate migration. Of course, the more bells and whistles that are added to the extraction
engine, the more likely it is that people will use it for data management rather than just data access.
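
The features described above can be sketched in Python. This is a simplified illustration, not the Census Bureau's extraction engine or the Michigan in-house program: it assumes a two-level hierarchy (household and person, with the family record type left out for brevity), and every field name and value is hypothetical. It shows rectangularizing, attaching household-level counts to each person row, and keeping all persons of a household when any one member meets a criterion.

```python
from typing import Any, Callable, Dict, List

Household = Dict[str, Any]
Person = Dict[str, Any]


def rectangularize(households: List[Household]) -> List[Dict[str, Any]]:
    """Return one flat row per person, carrying household fields and counts."""
    rows = []
    for hh in households:
        persons = hh["persons"]
        # Household-level count variables of the kind suggested in the text.
        n_children_under_4 = sum(1 for p in persons if p["age"] < 4)
        n_earners = sum(1 for p in persons if p["earnings"] > 0)
        for p in persons:
            row = {
                "hh_id": hh["hh_id"],
                "state": hh["state"],
                "n_children_under_4": n_children_under_4,
                "n_earners": n_earners,
            }
            row.update(p)  # append the person's own variables
            rows.append(row)
    return rows


def select_households(
    households: List[Household], criterion: Callable[[Person], bool]
) -> List[Household]:
    """Keep whole households in which at least one person meets the criterion."""
    return [hh for hh in households if any(criterion(p) for p in hh["persons"])]


# Made-up hierarchical records (hypothetical layout).
HOUSEHOLDS = [
    {"hh_id": 1, "state": "MI", "persons": [
        {"person_id": 1, "age": 34, "earnings": 20000, "foreign_born": False},
        {"person_id": 2, "age": 2, "earnings": 0, "foreign_born": False},
    ]},
    {"hh_id": 2, "state": "TX", "persons": [
        {"person_id": 1, "age": 61, "earnings": 0, "foreign_born": True},
        {"person_id": 2, "age": 58, "earnings": 15000, "foreign_born": False},
    ]},
]

all_rows = rectangularize(HOUSEHOLDS)
foreign_rows = rectangularize(
    select_households(HOUSEHOLDS, lambda p: p["foreign_born"])
)
```

Doing the merge inside the extraction step, rather than leaving it to the user, removes the opportunity for a novice to mismatch household and person records.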

Archival  Issues 

The final question that a system like DADS invokes is how it can be archived. How will the Census Bureau unpack DADS
so that they can turn over raw data and a codebook to the National Archives? Will there even be a codebook if the Census
Bureau intends not to disseminate raw files and technical documentation as it did in the past? If the confidentiality firewall is
built into the software, how can this information become part of the raw data so that confidentiality requirements continue to
be fulfilled? Much of DADS sounds dynamic, which suggests that the system will be updated to include more data and
perhaps that variables will be recoded to meet the demands of users. At what point will DADS be stabilized so that there is
an archival record?

Table 1 has a list of questions that can help provoke our thinking and serve as guidelines in determining who should be
responsible for making an archive out of DADS. Given the complexity and enormity of DADS, we may be tempted to allow
the Census Bureau to be the archive for the census of 2000 and for all future data products, particularly because the Census
Bureau looks like the archival expert when compared to the National Archives on many of these questions. However, it is
important to remember that most state and federal data producers have poor long-term memories about old data (sometimes
the definition of old is just a few years) and that there has been a lack of institutional memory within the Census Bureau
about previous data losses due to poor archival policy. An article by Dollar nicely summarizes the historical record of federal
data producers and the National Archives. In the decision on whether to archive summary statistics versus microdata from
the 1940 census the reasoning was "if the Government agency that created the records for statistical purposes did not fully
exploit them, it is hardly likely that anyone else will." (Dollar, 198x: 79). Thus, 1940 microdata were expendable.


Table  1 

Who  Should  be  Responsible  for  Data 

(1) Is there expertise in the creating agency that can explain the context, technicalities of the subject area, or the
idiosyncrasies of the data which would not be available if the records were transferred to an archive? Will that
expertise remain available for all electronic records, or only for those in active systems?

(2) What functionality of the system used to create the records is necessary to meet the needs of archival users? Can
the archives provide the necessary degree of functionality, or is the creating agency the only economically or
technologically feasible place to preserve the data in a usable format?

(3) Will the creating agency guarantee equitable access within freedom of information and confidentiality policy
guidelines?

(4) Do the records have continuing value to the creating agency so that it has an interest in and need to maintain the
records beyond an external requirement?

(5) Will there be a duplication of effort if the archives acquire electronic records that have continuing value to the
originating office?

(6) Where will the risk of loss or destruction be minimized?

(7) Can the creating agency guarantee that it will stabilize and not alter the archival record?

(8) Do regulations prohibit transfer of records from the custody of the original agency?

(9) What is the total cost to the organization to maintain electronic records for accountability and research purposes?
How can these costs be reduced for the institution as a whole, without eliminating services to users?

Source: Hedstrom, Margaret. 1991. "Archives: To Be or Not to Be: A Commentary." Archives and Museum Informatics, Technical
Report, Number 13.

Social Science Data Services During the Last Five Years of the
Millennium: Developments in the Delivery and Support of Data Services for Academic
Research in Europe and North America.


by Adam Luhanski¹,

Information Systems Manager

ESRC Centre for Economic Performance

Introduction,  Aims  and  Background 

In general, data librarians are supported by researchers and computer staff in their view that demands for data services are
likely to increase. Even where there have been technology related savings², other technology based tasks have arisen to add to
the number of tasks performed by data support services - for example, the management and/or construction and maintenance
of Web interfaces to data and associated documentation and literature.

The case for investing in data support services may seem clear to members of organisations such as IASSIST, CSS and Cause.
And in the more recent reports produced by research/teaching support funding bodies such as the UK's ESRC and JISC there
has been a marked increase in references to the data support environment. The main aim of this paper is to see if there is some
empirical basis for the claim that investment in the continued development of data support services is worthwhile. The
establishment of this claim will provide a sound basis from which to present the likely development scenarios of academic
data services up to 2000.

The background to this study is associated with observations of a number of trends in institutional policy in the broad areas of
social science support and administration in European and North American universities and associated research centres. These
trends include:

• increasing funding pressures on researchers and research supervisors to speed up submission rates - for example,
from 1998, the UK's Economic and Social Research Council (ESRC) will only fund PhDs at institutions where 60%
or more students submit their PhDs within 4 years (currently, this is set at 50%)³;

• the development of institutional measures to ensure researchers (and teachers) have necessary data resources and
appropriate information systems (IS) infrastructure - about 70% of US academic institutions claim to have IS
strategies⁴; and

• changes in IS infrastructure which have affected the resource demarcation between hitherto autonomous entities -
for example, the integration of some or all of audio-visual, computer, data, library, network and telecommunications
services⁵.

To this can be added societal changes, such as the rise of so-called meritocratic practices such that "good policy" requires that
position/status and associated resourcing have some empirical basis, as in evidence-based planning requirements⁶.

The first set of facts gathered in this study relates to the current and projected growth rates in empirical research. Whichever
way these are indicated, this growth is dramatic. The results of three methods of assessing empirical trends are summarised as
follows:

• article-content analysis shows a consistent growth in the proportion of empirically based journal articles (Oswald,
1992; Figlio, 1994; Stigler, 1995; and Platt, 1996)⁷;

• data access enhancements, particularly those associated with networking and interfacing, continue to speed up the
process of acquiring data and associated bibliographic references - for example, BIRON, BLS, IBSS, ICPSR and many
local developments such as the data subsetting services at the NBER, SSDC and CEPIS⁸; and

• IT enhancements (storage and processing) have enabled major increases in productivity - recorded in studies of
empirical research outputs (CEP/LSE) and business productivity measurement (MIT's CCS)⁹.

The Fulbright Study is the basis for the second part of this investigation. It captures data support experiences from three
perspectives: research, data services and IT/computer support. On the issue of efficiency of research and in-house data
support, views are summarised as follows:

• the researcher-teacher view (28/30) is that data support (local and central) is an essential component of an efficient
research environment - but, according to some (10/30), this may not be for ever;

• the data support service/person view (25/27) is that data and information services are experiencing a major upturn
in demand from both research and teaching activities (4 respondents also cited an increase in administration demands
for data advice); and

• the majority IT/computer support service/person view (17/26) is that the acquisition of information and data support
skills has become vital to their career prospects - others felt that networking and teaching support together with some
integration of audiovisual support appeared a more fruitful path.

What seems obvious to the stakeholders, however, may not be fully recognised by the funders and planners. Ultimately, in
the long run, data support funding will be determined by economic criteria.

The bad news, at the present time, is that many data services are not well placed within the order of things to ensure that their
strong economic arguments are well represented.

The good news is the data.

Background  Data  on  Data  Support  Services 

The selection of ninety or so interviewees, split about equally by the above types, was based on publication-citation methods
(Gutman Library, February 1995). Briefly, this method adopted the following research sequence:

Data was captured mainly during 30-40 minute interviews (during some thirty visits to North American research institutions,
March to May 1995); additional data was gathered from preliminary Internet searches and follow-up email to interviewees -
typically, clarification of interview notes.

Supplementary environmental evidence was gleaned from institutional policy documents, as they related to data services/
support, collected during the study. These included institutional responses from LSE, ESRC, national and state archives,
data suppliers (e.g. the BLS) and a sample of North American universities.

The Sample Population:

Interviewees              Researcher                     Data Support                  IT Support

Total interviewed         30                             27                            26

Female¹                   6                              18                            5

Male                      24                             9                             21

Job                       9 SRAs (Senior Research        8 Data Librarians             7 IT Assistants
                          Assistant/Officer)             3 Data Archivists
                          13 Professors                  3 Data Consultants            14 IT Managers
                          8 Directors (i.e. Directors    2 Info. Managers              5 IT Directors
                          of Research Centers)           4 Res. Managers
                                                         7 Data Managers

Research Center = 8       8 (of 400 fte)                 8 (of 20 fte)                 8 (of 14 fte + IT)
(Data Center = 7)         5 (of 100 fte)                 7 (of 22 fte)                 7 (of 15 fte)
University = 20           17 (of ? fte)                  12 (of 25 fte)                11 (of ? fte)
(17 Research)

Published                 27 (IBSS)                      15 (Cause/effect,             8+ (Cause/effect,
                                                         IASSIST, IBSS)                IASSIST, IBSS)

Cited                     23 (181)                       12 (Cause/effect,             Not counted
                          90? citations                  IASSIST - approx.)

(Size/scale - average)
Research Center           80 fte                         2.5 fte                       2 fte
Data Center               50 fte                         2 fte                         3 fte
University - research     5,000s+?                       2 fte                         30 fte (LSE)
University - teaching     6,000s+?                       1 fte                         30 fte (LSE)

1. An empirical basis for the prominence of women in computer-based data support is reported in Anderson, R.E., 1987.

? Denotes that figures were not noted at time of interview.

(All figures are for economic social sciences. IT Support includes networking and systems staff.)
Of seventeen research universities visited, virtually all (16) had a level of local data support far higher than that found
(informally) in the UK. The one university which did not claim to have any formal arrangements for data support did,
however, provide a very competent IT and computing advisory service together with a catalogued tape library facility. Basic
advice about dataset management tasks was given by a Program Advisory team which referred detailed queries to an "analyst
programmer with experience of databases".

Of the sixteen research universities claiming to provide a "resourced data service", four were classified as having basic data
support¹⁰; nine were classified as having intermediate data support¹¹; and three fitted the classification of full data support¹².


Level of Data Support         Basic Data Service      Intermediate Data Service      Full Data Service

Research Universities (17)

Research Centers (8)


• of all forty-eight respondents interviewed at the 17 research universities, 45 expressed unprovoked favourable opinions
of data support services - although nearly all said this was an under-resourced area (and were actively lobbying for better
funding); only two researchers said that their level of data support was adequate for local needs; and

• about half of the data support services/libraries were managed by the library service - a trend which was generally
welcomed, but opposed by 4 respondents (3c and 1d - i.e. three computer staff and one data support person) who favoured
independence.

Of  seven  research  universities  with  large  independently  funded  economic/social  research  centers  -  i.e.  similarly  configured  to 
LSE  and  CEP: 

• all seven had data support facilities (often named data libraries) both centrally based - typically, managed by the library
(1) or IT services divisions (2), sometimes independent (4) - and devolved in the research centers themselves (data from
interviews held at Harvard/NBER, Princeton/OPR, Cornell/CISER, Syracuse/CPR, Wisconsin/SSC, Ohio/NLSY and
Michigan/ICPSR);

• both central and devolved models of data support appeared to function and coordinate well (according to interviewees),
and were associated with high levels of researcher (and support staff) satisfaction; and

• all seven researchers (all experienced professors) interviewed had recently visited research universities in the UK
(typically, the LSE and one or two others), and expressed some dismay at the poor level of data and IT support facilities
for researchers - although conventional UK academic library facilities were rated highly.

Teaching universities provided data services through the library and IT/computer services. The VP of one university
described an "innovative plan" for creating information (and data) support teams attached to academic departments and
managed by the Library Service. Each team would comprise a "subject librarian", a "computer/network adviser" and an
"AV-graphics-teaching resources manager".

Overall, although a few data staff had major reservations, this sort of reorganisation - the CLIO Model - was expected to be a
feature of the IS future in teaching institutions. Of nine such support staff interviewed, all looked forward to re-defined jobs,
some with enthusiasm (5), others with apprehension (4).

Resources  and  costs  associated  with  data  support 
Facilities  (IT  infrastructure) 

All respondents seemed aware of the major time-savings enabled by technological advances. In particular, researchers were
keen to cite benchmarks for various modelling and statistical tasks. The following examples are typical of a dozen or so
proffered. These were provided by an Industrial Relations researcher (LSE and DTI) and a trade/productivity research
economist at ESRC's CEP:
Year - Stata v3     Machine            Cost (new)     Time (approx.)

1985                PC-XT              $=£1,400       6,000 secs
1990                386DX20            $=£2,000       500 secs
1993                486DX33            $=£2,000       90 secs
1995                Pentium90          $=£2,000       23 secs
1996                PentiumPro200      $=£3,000       7 secs
1996                Sun Model20-71     $=£7,000       30 secs*

* Sun will run two identical jobs in less than twice this time (actually, 51 secs).
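
The scale of these gains can be made explicit with a little arithmetic. The following Python sketch is an illustration only and not part of the study: it takes the PC timings from the Stata table above (the Sun row is left out, as it is not a desktop PC) and computes the speed-up relative to the 1985 PC-XT, together with the implied average yearly improvement factor. The function names are hypothetical.

```python
# Stata benchmark rows as reported in the table above: (year, machine, seconds).
stata_runs = [
    (1985, "PC-XT", 6000),
    (1990, "386DX20", 500),
    (1993, "486DX33", 90),
    (1995, "Pentium90", 23),
    (1996, "PentiumPro200", 7),
]


def speedups(runs):
    """Return (year, machine, speed-up relative to the first row)."""
    baseline = runs[0][2]
    return [(year, machine, baseline / secs) for year, machine, secs in runs]


def annual_improvement(runs):
    """Average yearly factor by which run time fell between first and last row."""
    (first_year, _, first_secs) = runs[0]
    (last_year, _, last_secs) = runs[-1]
    return (first_secs / last_secs) ** (1 / (last_year - first_year))


for year, machine, factor in speedups(stata_runs):
    print(f"{year}  {machine:<14} {factor:8.1f}x faster than the 1985 PC-XT")
print(f"average improvement: about {annual_improvement(stata_runs):.2f}x per year")
```

On these figures the same job runs several hundred times faster in 1996 than in 1985 at a broadly similar purchase price, which is the point the interviewees were making.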

The following times correspond to the running time of the same Gauss program solving a non-linear equation system for a
grid of points.

Year - Gauss        Machine            Cost (new)     Time (approx.)

1993                486DX66            $=£2,200       68 secs
1995                Pentium90          $=£2,000       19 secs
1996                Dell Latitude      $=£2,600       17 secs
                    (laptop P120)
1996                Pentium Pro        $=£3,000       7 secs
1996                Sun Model20-60     $=£7,000       20 secs*

* Unix times cannot be guaranteed if multi-user.

Typical  hardware  platforms  found  in  the  research  centers  included  several  Unix  boxes  (HP,  IBM  and  Sun)  and  about  one 
486/Pentium  per  fte  researcher  (excluding  part-time  postgraduates  who  typically  shared  pooled  386/486  facilities). 

• Novell (stable), NT (expanding) and Unix (stable) network servers were typical - typical storage capacities were 1Gb to 5Gb
for 486/Pentium servers and 5Gb to 50Gb for a Unix cluster

• A typical mid-sized research center had 52 DOS/Windows PCs, 10 Macs (Classic), 3 Unix, 1 VMS and 2 Novell servers -
supporting about 200 postgraduate students and 40 fte research staff

•  use  of  campus-wide  email  (with  approx.  allocation  of  1Mb  space  per  user)  was  typical  in  research  centers  -  as  opposed  to 
own  email  installation 

• most common/popular packages were WordPerfect/Word, Netscape/Mosaic, ELM/Pine/cc:Mail, Gauss, Excel/123/QuattroPro,
SAS/SPSS/Stata - little evidence of the use of programming languages such as FORTRAN and C.

Most interviewees reported major changes in the pattern of IT/computer support. For example, the central Program Advisory
Service, still prevalent in many UK universities, had all but disappeared in the US institutions in the Fulbright Study.
Typically, programmers had been relocated and redesignated as departmental or cluster IT support staff. In the department,
experienced programmers were often expected to provide a wide range of skills, covering software and hardware installation
as well as teaching support duties. Some common responses to this major structural change were as follows:

The majority of "ex analyst-programmers" experienced what they saw as a "deskilling process" - a minority were optimistic
about the challenge/value of learning new skills. The majority of this group expressed disquiet over "cost recovery" policies,
and some expected this to lead to their extinction.

Some data staff felt that lone researchers in particular had lost a valuable resource, the program advisor. Nearly all data staff
said they now found it necessary to provide some programming support for basic data management tasks - typically, SAS,
SPSS and Stata. Just over half of all data staff interviewed (i.e. 14 of 27) appeared familiar with one or other of these
programs - most of these said they had always seen data management programming as part of their remit, although they also
reported less demand for detailed program advice.
In turn, nearly all IT support staff (23 of 26) reported concern that their skills needed to be upgraded (19), or had already been
upgraded (4), to cope with new information and data management tasks. Many computer staff reported a major decline in the
demand for their programming skills, and some said they had stopped all programming activity "many years ago".

The  following  IT/computer  issues  were  cited  frequently: 

• around one third of data staff stressed that "computer skills were a basic requirement for data support staff" (10d=10
citations)

• some IT support staff (below managers) were concerned that IT managers were not offering appropriate/relevant training
for IT staff, particularly data skills (5c)

• the use of public PC facilities by students was often 100% with queuing at peak times, indicating that demand exceeded
supply - although some universities experienced a decrease in use of public facilities, as students stayed off-site (long
journeys, bad weather, good support for modem links or local networking, etc. encouraged purchase of laptops)

The  1997-2000  IT  Outlook 

As predicted by Richard Rockwell (IASSIST, 1993), more powerful personal desktop PC-Workstations
(running Unix and Windows NT/95) have continued to enable researchers to
process large-scale datasets extracted from local and wide area networks. All respondents in
the Fulbright Study expected continued performance improvements in desktop processing
and data management, enabling further gains in research output. There continues to be
general optimism about the contribution of IT hardware and software advances
to greater research productivity. The demand for IT support of remote laptop and
home computing (distance learning) is expected to increase, stimulated by the growth in
quality teaching software. In the light of so much concern expressed about reorganisation of
support services, we may expect a professional review of all research support services. IT,
library and data support staff will take the initiative (data piloted) and produce a more
user-oriented information service.

Data Consultancy/Services (local data support)

According to researchers and data staff, the delivery of data support has become far more proactive. All data staff provided
examples of "going out there" to find datasets, to advise on the best use of the data service (and other larger data facilities)
and to help construct enhanced services through interlinked Web pages.

Library based data staff were most enthusiastic about the contribution of CD-ROM based datasets; some others, particularly
experienced data support staff, seemed sceptical about the ultimate value of this form of data dissemination.

Researchers, closely followed by everyone else, were perplexed by the management and demarcation of CD-ROM data
(typically supported by the library service) and data on other media (typically found in data centers). This was generally put
down to some form of historical determinism, and there was little evidence of plans for change in this respect!

Researchers and data staff reported that Internet type enhancements to data services had become expensive to maintain.
Expectations were high, following the early lead taken (voluntarily) by data staff in constructing useable interfaces to
datasets.

Web  weavers  reported  time  costs  between  2  hours  and  two  days  per  week  for  basic  to  comprehensive  coverage  of  data 
services.  Much  of  this  work  had  been  undertaken  without  additional  funding.  Data  and  research  staff  had  become  de  facto 
Web  Advisors. 

Invariably, data support staff expressed "grave concerns" about data security and quality - particularly, in environments of
decreased IT support services.

The following data issues were cited frequently:

• researchers in research groups/centers appeared less interested in programming support - although lone researchers still
needed help (4d+1r=5 citations)

• researchers seemed more concerned about quality of data accessibility, particularly with respect to speed
(21r+l(Xl+6c=31  citations) 

• data staff and some IT staff were troubled by the ease of passing on large-scale undocumented (or poorly documented)
datasets (22d+12c=34 citations)

• a small number of experienced researchers were concerned about a possible decline in the quality of data analysis - due
to the trend towards increased accessibility (2r)

• European data could be difficult to locate, and often impossible to acquire (8d+9r=17 citations)

• the majority of researchers preferred to download data directly to their personal machines, using their own data checking
skills (18r)

• some data bureaux seemed reluctant to develop user services - so ICPSR (good at data checking) was playing an essential
role (5r)

• the majority of researchers preferred to download entire files - rather than make "front-end decisions" - even in the case of
very large datasets (21r)

• researchers were keen to support the central university data repository - saying good local availability was important to
research productivity (23r)

• about half of the experienced researchers interviewed said they liked to send their research assistants to the IT/data center
- the other half tended to seek assistance directly from IT and data staff as appropriate
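
The bracketed tallies attached to these issues can be checked mechanically. The sketch below is illustrative only and not part of the study: it assumes that r, d and c stand for researchers, data staff and IT/computer staff respectively (the text defines c and d explicitly; r is inferred), and the function names are hypothetical. It splits a tally into counts per group and confirms that they add up to the stated number of citations.

```python
import re

# Assumed reading of the one-letter codes used in the tallies.
ROLE_NAMES = {"r": "researchers", "d": "data staff", "c": "IT/computer staff"}


def parse_tally(tally):
    """Split a tally such as '8d+9r=17' into per-role counts and the stated total."""
    left, _, stated = tally.replace(" ", "").partition("=")
    counts = {}
    for number, role in re.findall(r"(\d+)([rdc])", left):
        counts[role] = counts.get(role, 0) + int(number)
    total = int(stated) if stated else None
    return counts, total


def is_consistent(tally):
    """True when the per-role counts sum to the stated total (or no total is given)."""
    counts, total = parse_tally(tally)
    return total is None or sum(counts.values()) == total


for tally in ["4d+1r=5", "22d+12c=34", "8d+9r=17"]:
    counts, total = parse_tally(tally)
    breakdown = ", ".join(f"{n} {ROLE_NAMES[role]}" for role, n in counts.items())
    print(f"{tally}: {breakdown} (consistent: {is_consistent(tally)})")
```

A check of this kind is a quick way to catch transcription slips when tallies are copied between interview notes and a report.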

The  1997-2000  Data  Outlook 

General expectation of increased researcher self-sufficiency (with network infrastructure
and data support) in programming/computing tasks. Alongside this, more effort in enabling
access to datasets through the Internet. Later rather than sooner (evidence suggests) someone
brave will pull CD-ROM data together with other data media. Expectation, in the long run,
of large investment in distributed data services via Web/Internet - economists/accountants
will work with data staff, network/communications staff and higher education planners to
produce properly resourced infrastructure. In the meantime, data staff will continue to
produce prototype Web Data Servers without proper funding, and to experiment with the
linking of datasets, documentation and bibliographic information. Many data staff will
change from being de facto Web Advisors to de jure Information Managers.

Data  Archives  and  Data  Services 

• a significant number of data support staff (and others) were concerned about small-scale institutions - in particular, their
inability to manage and afford big datasets (7d+5r+4c)

•  even  in  larger  institutions,  data  staff  said  that  if  funding  problems  persist,  central  archives  such  as  ICPSR  would  become 
still  more  important  (3d) 

• a few IT support staff and researchers stated their preference for getting data directly from central large-scale/national
archives (Essex, ICPSR, Roper, etc.) which might assure the quality and security of important datasets (4c+2r)

• some experienced researchers appeared keen to get data direct from source, and to bypass both local and national data
services and archives (6r)

• researchers and data staff based in specialist research centers expected to play a major role as data resource centers,
claiming the "full set of research, data and computing expertise" at the necessary level of expertise to advise specialist
research projects (6 of 7r + 7 of 7d + 5 of 7c)

The  1997-2000  Data  Archive  Outlook 

There was some expectation of devolution of large-scale data archives - major research
universities and research centers will negotiate to get data associated with their specialism
direct from source. Specialist research centers will work with major archives to distribute
datasets, associated materials and expert advice to high level research projects. Data archives
will continue to distribute datasets to the majority of non-specialist institutions, and to
provide some further "one-stop-shop" support for institutions unable to resource a local data
service. Data archives will combine with national data services and social science information
gateways to lead the management and coordination of specialised data services and
associated expertise based at universities and research centers. At an international level, they
will plan and manage the network of specialist data servers, and they will jointly work
towards making national datasets statistically comparable.

Structural/organisational  trends 

Whilst the majority of researchers and data staff supported developments in the integration of data services and libraries,
some had major reservations - citing loss of autonomy, deskilling and reduced service as likely outcomes.

IT support devolution and cost recovery continued to alarm support staff and over-exercise IS managers. It appears that every
institution has completed or is considering a major reshaping of teaching and research support services.

Most researchers (17) supported the development of "one-stop-shopping" - i.e. the integration of IT and data services -
although some (4), typically experienced, researchers questioned whether extending the range of skills might dilute the
expertise. One experienced researcher and a few (4) data staff viewed integration plans as "cosmetic", and counter-productive
in that experienced data staff were likely to be lost or become disaffected in the transition.

Integration of data and library catalogues was fully supported. About half of data staff reported that datasets and library
records "have been or are being fully integrated"; a third said it was "being planned"; and the others said they "expected
integration of catalogues to happen soon".

Most researcher-teachers (18 of 21 interviewed) supported "the trend" to deliver research (project) based courses to
undergraduates using "real datasets".

Some "research-led teachers" discussed the need for a more effective Information Systems structure to enable appropriate
support for courses which required a range of data and information inputs together with more advanced information
management and processing techniques - cf. courses which employ artificial intelligence methods¹³. In this scenario the
popularity of the one-stop-shop was very evident.

Network and Communications remain central services - albeit with evidence of growth in the number of local Servers. The
move towards full integration of voice and network services continues - reaching over 50% in research universities¹⁴. About
half of the institutions visited charged for ethernet (or token ring) connections, and some added a rental charge - $130 for
network installation (+ $5 additional annual rental in some) was typical.

There were reports of an increase in (hitherto flattish) demand for remote computing which IT support staff expected to
further stretch their reduced (in real terms) resources. Teachers, students and researchers are already expecting computer
advice from remote locations (i.e. from home, conference locations, etc.).

The  1997-2000  Structural  Outlook 

The continued devolution of large-scale central IT services seems likely, although a few of
the very successful central systems should be able to construct a professional/economic case.
Professional groups such as IASSIST and CSS will cooperate to produce documentation of
"models of successful research support systems". Joint work on teaching support will enable
teachers to deliver remote/distant learning courses using real datasets extracted from central
archives (for general/introductory courses) and specialist data servers (for advanced courses).

Problem  areas 

Data staff were seriously concerned over a number of data security issues. All experienced staff (23 of 27) said they had
initiated (8 of 23) or were initiating (10 of 23) or would/should initiate (5 of 23) procedures for data checking in light of bad
dataset transfers.

Large scale data transfers using FTP were commonly cited as error prone¹⁵, and bad Windows transfers (via File Manager,
particularly from CD-ROM) and tape backups were also reported.

The following problems were cited frequently:
• Proliferation of forms (8d)

• a few staff mentioned the importance of getting away from the "format statement", which was seen as problematic for
researchers and time consuming for support staff (3d+1c+2r)

•  variable  extraction  via  Web  interfaces  was  generally  supported  -  but  there  were  some  fears  that  speedy  extraction  would 
mean  misuse  of  data,  particularly  if  "data  alerts"  were  not  built  into  the  system  (5d) 

• researchers complained about the work overload at ICPSR which meant they had to plan for up to eight weeks' delay
from data order (via ICPSR and local Data Library) - one researcher said she advised colleagues to "order data on the
expectation that you may need it!" (6r)

The  1997-2000  Problems  Outlook 

Expect "contents to check contents", i.e. auto check for data consistency. Data users to
feed back errors through a speedy "feedback system". Overdue replacement of paper forms by
electronic forms. Data staff will make a professional case for greater investment in the
integration of metadata with datasets, and experienced researchers will advise Web Data
Server designers on the attachment of appropriate data documentation to subsets.

Success  factors  and  performance  indicators  associated  with  data  support 

All researchers interviewed showed a keen awareness of research technologies and their contribution to research productivity.
About half said they were sceptical of Windows style GUIs, but all research respondents said that productivity gains from
advances in operating systems (for example, multi-tasking and large memory management) had made a major contribution to
their (and others') empirical research. About a third (9 of 30) volunteered detailed benchmark figures consistent with those
cited earlier in this paper.

Most respondents said that the quality of output was higher due to both IT and data support (roughly equally, when
prompted). Two respondents (senior/experienced researchers) said their own research benefitted very little from local data
services and a great deal from IT support - system programmers' advice. In one case, local data services did not feature at his
institution - he tended to use highly skilled systems programmers to assist with data management tasks. Six researchers
argued that their research could not be undertaken properly without assistance from local "highly skilled data support staff".

As might be expected, productivity issues cited by data staff invariably reflected the content of the recent IASSIST
Newsletter/Journal coverage. The importance of documentation featured in all interviews. Content analysis of responses
to an open-ended question on "what matters most?" shows the following recent articles to be representative of the range of
issues cited: for example, general issues covered in Rasmussen, 1995, on-line codebooks by Sheih 1995, quality and
accessibility by Beedham 1995, production by Winstanley and standards by Greene 1995.

IT/computer support staff were much more likely than researchers and data staff to mention shortfall in training, both in
terms of their own needs and the requirements of end-users.

Data  staff  were  most  concerned  about  getting  additional  resources  for  new  developments  such  as  the  delivery  of  subsetting 
services  and  related  documentation. 

The Fulbright Study showed a strong association between high levels of local data support and good performance¹⁶. Nearly
all researchers were keen to empathise with the data services view that local data services are a key factor in the production
of highly cited research publications.

Unprompted,  over  half  of  all  respondents  expected  data  support  to  be  a  major  component  of  developing  teaching  methods, 
particularly  new  and  redesigned  undergraduate  courses. 

There are also strong a priori grounds for associating data support with good research performance. The evidence for growth
in quantity and quality of empirical research is very strong, and it is also clear that academics are rewarded for research
performance measured by publications and citations.

The variables that most distinguished the academics in the sample who had been promoted from those who had not included
rate of publication in refereed journals, level of citation, research grants applied for and obtained and the number of PhD
students under a person's supervision. Likelihood of promotion was correlated negatively with self-reported commitment to
teaching¹⁷.
The 1997-2000 Resource Outlook

Expect the organisation of local research support to be investigated more rigorously with a
view to expansion in light of its proven contribution to research and teaching productivity.

Some economic conclusions

One thing is for certain in this study: researchers, data services managers and IT staff all feel the funding future to be
uncertain. While they may have clear visions of what the direction of research support services ought to be, they are
nervous about policy-making.

The bad news is that this concern is well-founded. Most researchers and virtually all research support staff (outside the library)
are badly placed (in the order of things) to make a big impact on resource policy. And, as virtually every interviewee in the
study has mentioned, failure to compete professionally for the appropriate level of resources will not enable their data Utopia
to become reality - or even virtual reality.

The quite good news is that they do have lots of real data on the productivity benefits of data support to enable the
construction of a strong case for further investment. They also have the skills to disseminate this evidence. What is required is
a framework for evaluating the contribution, and for this they may need to find time to review the small but growing literature
in information economics. Recent work on the economics of the Internet and on the contribution of IT to business may provide
some clues as to what to look for and what to measure in the context of research inputs.

As in other areas, the returns to investment in research support services can be measured in terms of productivity changes,
performance and consumer benefits.

A number of recent research papers claim that investment in IT is associated with increased productivity, increased consumer
benefits but unchanged business performance¹⁸. According to Brynjolfsson (1993), these results are compatible with
conventional economic theory, i.e. "... firms are making the IT investments necessary to maintain competitive parity but are
not able to gain competitive advantage".

Productivity gains to business and benefits to consumers due to investment in IT have been found to be strong. However, the
impact of IT on Business Performance seems to be slight, sometimes negative. It appears from a stream of IT literature on
business performance (1989-1993, cited by Hitt) that firms are unable to increase their profits through IT investment; indeed,
while IT may be creating enormous value, it may simultaneously be intensifying competition and enabling entry, and thus
lowering prices.

The really good news for data support services is that their contribution to empirical research is truly, widely and deeply
recognised.

It is time to invest some of the energy and enthusiasm of research and teaching support services into the production of an
empirically based case for expansion.